Abstract: Interior-point algorithms constitute a very interesting
class of algorithms for solving linear-programming problems.
In this paper we study efficient implementations of such algo- 
rithms for solving the linear program that appears in the linear- 
programming decoder formulation. 

I. Introduction 

Consider a binary linear code C of length n that is used 
for data transmission over a binary-input discrete memoryless 
channel. As was observed by Feldman et al. [1], [2], the ML 
decoder for this setup can be written as 

$$\hat{x}_{\mathrm{ML}} = \arg\min_{x \in C} \langle \gamma, x \rangle,$$

where $\gamma$ is a length-$n$ vector that contains the log-likelihood
ratios and where $\langle \gamma, x \rangle$ is the inner product (in $\mathbb{R}$) of the
vector $\gamma$ with the vector $x$. Because the cost function in this
minimization problem is linear, this is essentially equivalent
to the solution of

$$\hat{x}'_{\mathrm{ML}} = \arg\min_{x \in \operatorname{conv}(C)} \langle \gamma, x \rangle,$$

where $\operatorname{conv}(C)$ denotes the convex hull of $C$ when $C$ is
embedded in $\mathbb{R}^n$. (We say "essentially equivalent" because
when there is a unique optimal codeword, the
two minimization problems yield the same solution. However,
when there are multiple optimal codewords, $\hat{x}_{\mathrm{ML}}$ and $\hat{x}'_{\mathrm{ML}}$
are non-singleton sets and it holds that $\operatorname{conv}(\hat{x}_{\mathrm{ML}}) = \hat{x}'_{\mathrm{ML}}$.)

Because the above two optimization problems are usually 
practically intractable, Feldman et al. [1], [2] proposed to solve 
a relaxation of the above problem. Namely, for a code C that 
can be written as the intersection of m binary linear codes 
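of length $n$, i.e., $C = \bigcap_{j=1}^{m} C_j$, they introduced the so-called
linear programming (LP) decoder

$$\hat{x}_{\mathrm{LP}} = \arg\min_{x \in \mathcal{P}} \langle \gamma, x \rangle, \tag{1}$$

with the relaxed polytope

$$\mathcal{P} = \bigcap_{j=1}^{m} \operatorname{conv}(C_j) \supseteq \operatorname{conv}(C) \supseteq C, \tag{2}$$
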
"This is essentially the paper that appeared in the Proceedings of the 
2008 Information Theory and Applications Workshop, UC San Diego, CA, 
USA, January 27 - February 1, 2008. The only (major) change concerns the 
vector $\gamma$: it is now defined such that $\hat{x}_{\mathrm{ML}}$, $\hat{x}'_{\mathrm{ML}}$, and $\hat{x}_{\mathrm{LP}}$ are solutions to
minimization (and not maximization) problems.



for which it can easily be shown that all codewords in $C$ are
vertices of $\mathcal{P}$.

The same polytope $\mathcal{P}$ appeared also in papers by Koetter
and Vontobel [3], [4], [5], where message-passing iterative
(MPI) decoders were analyzed and where this polytope $\mathcal{P}$ was
called the fundamental polytope. The appearance of the same
object in these two different contexts suggests that there is a 
tight connection between LP decoding and MPI decoding. 

The above codes $C_j$ can be any codes of length $n$; however,
in the following we will focus on the case where these codes
are codes of dimension $n - 1$. For example, let $H$ be an $m \times n$
parity-check matrix for the code $C$ and let $h_j$ be the $j$-th row
of $H$.$^{1}$ Then, defining

$$C_j = \bigl\{ x \in \{0, 1\}^n \bigm| \langle h_j, x \rangle = 0 \ (\mathrm{mod}\ 2) \bigr\}$$

for $j = 1, \ldots, m$, we obtain $C = \bigcap_{j=1}^{m} C_j$.
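The construction above can be checked by brute force on a toy example. The following is a minimal Python sketch; the function names and the small matrix `H` are our own illustrative choices and do not come from the paper. It enumerates each single-parity-check code $C_j$ and intersects them.

```python
from itertools import product


def local_code(h):
    """All x in {0,1}^n with <h, x> = 0 (mod 2), i.e. the code C_j for one row h."""
    n = len(h)
    return {x for x in product((0, 1), repeat=n)
            if sum(hi * xi for hi, xi in zip(h, x)) % 2 == 0}


def code_from_parity_checks(H):
    """Intersect the local codes C_j over all rows of H."""
    return set.intersection(*(local_code(h) for h in H))


# Hypothetical toy parity-check matrix (m = 2, n = 4).
H = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
C = code_from_parity_checks(H)
```

Each nonzero row yields a code with $2^{n-1}$ codewords, which is the dimension $n - 1$ mentioned above. Enumeration is exponential in $n$ and is only meant for sanity checks on very short codes.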

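Of course, the decoder in (1) is called the LP
decoder because the optimization problem in that equation
is a linear program (LP).$^{2}$ There are two standard forms for
LPs, namely

$$\begin{aligned} \text{minimize} \quad & \langle c, x \rangle \\ \text{subj. to} \quad & A x = b \\ & x \ge 0 \end{aligned} \tag{3}$$

and

$$\begin{aligned} \text{maximize} \quad & \langle b, \lambda \rangle \\ \text{subj. to} \quad & A^{T} \lambda + s = c \\ & s \ge 0. \end{aligned} \tag{4}$$
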

Any LP can be reformulated (by introducing suitable auxiliary 
variables, by reformulating equalities as two inequalities, etc.) 
so that it looks like the first standard form. Any LP can also 
be reformulated so that it looks like the second standard form. 
Moreover, the first and second standard form are tightly related
in the sense that they are dual convex programs. Usually, the
LP in (3) is called the primal LP and the LP in (4) is called the
dual LP. (As is to be expected from the expression "duality,"
the primal LP is the dual of the dual LP.)
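The link between the two standard forms can be made explicit with the usual weak-duality computation (a standard argument that is not spelled out in this paper). For any $x$ that is feasible in (3) and any pair $(\lambda, s)$ that is feasible in (4):

```latex
\langle c, x \rangle - \langle b, \lambda \rangle
  = \langle A^{T}\lambda + s,\, x \rangle - \langle A x,\, \lambda \rangle
  = \langle s, x \rangle \;\ge\; 0 .
```

Hence every dual-feasible point gives a lower bound on the primal optimal value, and the gap between the two cost functions equals $\langle s, x \rangle$.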

Not unexpectedly, there are many ways to express the LP
that appears in (1) in either the first or the second standard

1 Note that in this paper all vectors are column vectors.

2 We use LP to denote both "linear programming" and "linear program." 



form, and each of these reformulations has its advantages (and 
disadvantages). Once it is expressed in one of the standard 
forms, any general-purpose LP solver can basically be used to 
obtain the LP decoder output. However, the LP at hand has a 
lot of structure and one should take advantage of it in order to 
obtain very fast algorithms that can compete complexity- and 
time-wise with MPI decoders. 
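As one concrete instance of such a reformulation, when the codes $C_j$ are single-parity-check codes the relaxed polytope can be described by box constraints together with the well-known odd-subset inequalities $\sum_{i \in S} x_i - \sum_{i \in N_j \setminus S} x_i \le |S| - 1$ for every odd-sized subset $S$ of the support $N_j$ of row $j$. This description is a standard fact that we assume here; it is not stated in this paper. A minimal Python sketch of a membership test follows; the function name and the tolerance are our own choices.

```python
from itertools import combinations


def in_relaxed_polytope(H, x, tol=1e-9):
    """Check 0 <= x_i <= 1 and, for every row of H, all odd-subset inequalities
    sum_{i in S} x_i - sum_{i in N minus S} x_i <= |S| - 1."""
    if any(xi < -tol or xi > 1 + tol for xi in x):
        return False
    for row in H:
        support = [i for i, h in enumerate(row) if h == 1]
        total = sum(x[i] for i in support)
        for size in range(1, len(support) + 1, 2):   # odd |S| only
            for S in combinations(support, size):
                in_S = sum(x[i] for i in S)
                if in_S - (total - in_S) > size - 1 + tol:
                    return False
    return True
```

The number of inequalities grows exponentially in the check degree, which is one reason why exploiting the structure of the LP, rather than handing this description to a general-purpose solver, is attractive.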

Several ideas have been presented in the past in this direc- 
tion, e.g., by Feldman et al. [6] who briefly mentioned the 
use of sub-gradient methods for solving the LP of an early 
version of the LP decoder (namely for turbo-like codes), by 
Yang et al. [7], [8] on efficiently solvable variations of the 
LP decoder, by Taghavi and Siegel [9] on cutting-hyperplane- 
type approaches, by Vontobel and Koetter [10] on coordinate- 
ascent-type approaches, by Dimakis and Wainwright [11] 
and by Draper et al. [12] on improvements upon the LP 
decoder solution, and by Taghavi and Siegel [13] and by 
Wadayama [14] on using variations of LP decoding (together 
with efficient implementations) for intersymbol-interference 
channels. 

In this paper our focus will be on so-called interior-point 
algorithms, a type of LP solvers that has become popular with 
the seminal work of Karmarkar [15]. (After the publication 
of [15] in 1984, earlier work on interior-point-type algorithms 
by Dikin [16] and others became more widely known.) We
present some initial thoughts on how to use this class of
algorithms in the context of LP decoding. So far, with the
notable exception of [14], interior-point-type algorithms that
are especially targeted to the LP in (1) do not seem to have
been considered. One of our goals in pursuing this type
of method is to potentially obtain algorithms that
are more amenable to analysis than MPI decoders, especially when it
comes to finite-length codes. (Wadayama [14] discusses some
efficient interior-point-type methods; however, he is trying to
minimize a quadratic cost function, and the final solution is
obtained through the use of the sum-product algorithm that
is initialized by the result of the interior-point search. Although
[14] presents some very interesting approaches that are
worthwhile pursuing, it is not quite clear if these algorithms
are more amenable to analysis than MPI decoders.)

There are some interesting facts about interior-point-type 
algorithms that make them worthwhile study objects. First of 
all, there are variants for which one can prove polynomial-time 
convergence (even in the worst case, which is in contrast to the 
simplex algorithm). Secondly, we can round an intermediate
result to the next vector with only $0/\tfrac{1}{2}/1$ entries and
check if it is a codeword.$^{3}$ (This is very similar to the
stopping criterion that is used for MPI algorithms.) Note that
a similar approach will probably not work well for simplex-type
algorithms that typically wander from vertex to vertex
of the fundamental polytope. The reason is that rounding the
coordinates of a vertex yields a codeword only if the vertex

3 To be precise, by rounding we mean that coordinates below $\tfrac{1}{2}$ are mapped
to 0, that coordinates above $\tfrac{1}{2}$ are mapped to 1, and that coordinates equal
to $\tfrac{1}{2}$ are mapped to $\tfrac{1}{2}$.



was a codeword.$^{4}$ Thirdly, interior-point-type algorithms are
also interesting because they are less sensitive than simplex- 
type algorithms to degenerate vertices of the feasible region; 
this is important because the fundamental polytope has many 
degenerate vertices. 
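The rounding test mentioned in the second point can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the tolerance `tol` used to decide when a coordinate "equals" one half is our own assumption to cope with floating-point iterates.

```python
def round_half(x, tol=1e-9):
    """Map coordinates below 1/2 to 0, above 1/2 to 1, and equal to 1/2 to 1/2."""
    rounded = []
    for xi in x:
        if abs(xi - 0.5) <= tol:
            rounded.append(0.5)
        elif xi < 0.5:
            rounded.append(0)
        else:
            rounded.append(1)
    return rounded


def is_codeword(H, x):
    """True if x has only 0/1 entries and satisfies every parity check of H."""
    if any(xi not in (0, 1) for xi in x):
        return False
    return all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)
```

An interior-point iteration could call `is_codeword(H, round_half(x))` on its current point `x` after every step and use the outcome as a stopping test.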

The present paper is structured as follows. In Secs. II and III
we discuss two classes of interior-point algorithms, namely
affine-scaling algorithms and primal-dual interior-point algorithms,
respectively. As we will see, the bottleneck step of
the algorithms in these two sections is to repeatedly find
the solution to a certain type of system of linear equations.
Therefore, we will address this issue, and efficient solutions to
it, in Sec. IV. Finally, we briefly mention some approaches for
potential algorithm simplifications in Sec. V and we conclude
the paper in Sec. VI.

II. Affine-Scaling Algorithms

An interesting class of interior-point-type algorithms is the class of
so-called affine-scaling algorithms, which were introduced
by Dikin [16] and re-invented many times afterwards. Good
introductions to this class of algorithms can be found in [17], 
[18]. 

Fig. 1 gives an intuitive picture of the workings of one
instance of an affine-scaling algorithm. Consider the LP in (3)
and assume that the set of all feasible points, i.e., the set of all
$x$ such that $Ax = b$ and $x \ge 0$, is a triangle. For the vector
$c$ shown in Fig. 1 the optimal solution will be the vertex in
the lower left part. The algorithm works as follows:

1) Select an initial point that is in the interior of the set of 
all feasible points, cf. Fig. 1(b), and let the current point
be equal to this initial point. 

2) Minimizing (c, x) over the triangle is difficult (in fact, 
it is the problem we are trying to solve); therefore, 
we replace the triangle constraint by an ellipsoidal 
constraint that is centered around the current point. Such 
an ellipsoid is shown in Fig. 1(c). Its skewness depends
on the closeness to the different boundaries. 

3) We then minimize the function (c, x) over this ellipsoid. 
The difference vector between this minimizing point 
and the center of the ellipsoid (see the little vector in 
Fig. 1(d)) points in the direction in which the next step
will be taken. 

4) Depending on what strategy is pursued, a shorter or a 
longer step in the above-found direction is taken. This 
results in a new current point. (Whatever step size is 
taken, we always impose the constraint that the step size 
is such that the new current point lies in the interior of 
the set of feasible points.) 

5) If the current point is "close enough" to some vertex then 
stop, otherwise go to Step 2. ("Closeness" is determined 
according to some criterion.) 

4 Proof: in an LDPC code where all checks have degree at least two, the
largest coordinate of any nonzero-vector vertex is at least $\tfrac{1}{2}$. Therefore, there
is no nonzero-vector vertex that is rounded to the all-zero codeword. The 
proof is finished by using the symmetry of the fundamental polytope, i.e., the 
fact that the fundamental polytope "looks" the same from any codeword. 




Fig. 1. Some iterations of the affine-scaling algorithm. (See text for details.)



Not surprisingly, when short (long) steps are taken in Step 4, 
the resulting algorithm is called the short-step (long-step) 
affine-scaling algorithm. Convergence proofs for different 
cases can be found in [19], [20], [21]. 
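Steps 1 to 5 can be sketched in Python for the LP in (3). This is a minimal sketch under our own assumptions, not the implementation studied in this paper: it uses the textbook primal affine-scaling direction with the scaling matrix $D = \operatorname{diag}(x)$, it always moves a fixed fraction `step` of the way to the boundary (a long-step rule), it stops when the update becomes smaller than `tol`, and it solves the small linear system by dense Gaussian elimination. All function and parameter names are hypothetical.

```python
def solve_dense(P, v):
    """Solve P u = v by Gaussian elimination with partial pivoting (tiny systems only)."""
    h = len(v)
    M = [list(P[i]) + [v[i]] for i in range(h)]
    for col in range(h):
        piv = max(range(col, h), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, h):
            f = M[r][col] / M[col][col]
            for k in range(col, h + 1):
                M[r][k] -= f * M[col][k]
    u = [0.0] * h
    for r in range(h - 1, -1, -1):
        u[r] = (M[r][h] - sum(M[r][k] * u[k] for k in range(r + 1, h))) / M[r][r]
    return u


def affine_scaling(A, c, x, step=0.5, tol=1e-7, max_iter=200):
    """Minimize <c, x> subject to A x = b, x >= 0, starting from a strictly
    positive x that already satisfies A x = b (Step 1)."""
    m, n = len(A), len(c)
    x = list(x)
    for _ in range(max_iter):
        d2 = [xi * xi for xi in x]                 # Step 2: ellipsoid shape D^2
        P = [[sum(A[i][k] * d2[k] * A[j][k] for k in range(n)) for j in range(m)]
             for i in range(m)]                    # P = A D^2 A^T
        v = [sum(A[i][k] * d2[k] * c[k] for k in range(n)) for i in range(m)]
        u = solve_dense(P, v)                      # bottleneck: solve P u = v
        reduced = [c[k] - sum(A[i][k] * u[i] for i in range(m)) for k in range(n)]
        dx = [-d2[k] * reduced[k] for k in range(n)]   # Step 3: direction
        ratios = [-x[k] / dx[k] for k in range(n) if dx[k] < 0]
        if not ratios:                             # no coordinate decreases
            break
        t = step * min(ratios)                     # Step 4: stay in the interior
        x = [x[k] + t * dx[k] for k in range(n)]
        if max(abs(t * d) for d in dx) < tol:      # Step 5: crude closeness test
            break
    return x
```

For the triangle $\{x \ge 0 : x_1 + x_2 + x_3 = 1\}$ with $c = (1, 2, 3)$, calling `affine_scaling([[1.0, 1.0, 1.0]], [1.0, 2.0, 3.0], [1/3, 1/3, 1/3])` moves from the center of the triangle towards the vertex $(1, 0, 0)$.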

Of course, an affine-scaling algorithm can also be formulated
for the LP in (4). Moreover, instead of the above-described
discrete-time version, one can easily come up with
a continuous-time version, see e.g. [22]. The latter type of 
algorithms might actually be interesting for decoders that are 
implemented in analog VLSI. 

The bottleneck step in the affine-scaling algorithm is to find 
the new direction, which amounts to solving a system of linear 
equations of the form Pu = v, where P is a given (iteration- 
dependent) positive definite matrix, v is a given vector, and u 
is the direction vector that needs to be found. We will comment 
on efficient approaches for solving such systems of equations 
in Sec. IV.
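To indicate where such a system comes from, consider the textbook primal affine-scaling step for the LP in (3). The symbols $x^{(k)}$ (current interior point), $D$ (scaling matrix), and $\rho$ (ellipsoid radius) are our own notation and are not taken from this paper. The ellipsoidal subproblem and the resulting linear system read:

```latex
\begin{aligned}
&\text{minimize } \langle c, x \rangle
  \quad \text{subj. to } A x = b,\;
  \bigl\| D^{-1} \bigl( x - x^{(k)} \bigr) \bigr\|_2 \le \rho,
  \qquad D = \operatorname{diag}\bigl( x^{(k)} \bigr), \\
&\underbrace{A D^{2} A^{T}}_{P}\, u = \underbrace{A D^{2} c}_{v},
  \qquad
  \Delta x \;\propto\; - D^{2} \bigl( c - A^{T} u \bigr).
\end{aligned}
```

In this formulation $P = A D^{2} A^{T}$ is positive definite whenever $A$ has full row rank and $x^{(k)}$ is strictly positive, and it changes at every iteration because $D$ does.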

III. Primal-Dual Interior Point Algorithms 

In contrast to affine-scaling algorithms, which either work 
only with the primal LP or only with the dual LP, primal- 
dual interior point algorithms - as the name suggests - work 
simultaneously on obtaining a primal and a dual optimal 
solution. A very readable and detailed introduction to this 
topic can be found in [23]. As in the case of the affine-scaling 
algorithm there are many different variations: short-step, long- 
step, predictor-corrector, path-following, etc. 

Again, the bottleneck step is to find the solution to a system 
of linear equations Pu = v, where P is a given (iteration- 
dependent) positive definite matrix, v is a given (iteration- 
dependent) vector, and u is the sought quantity. We will 
comment in Sec. IV on how such systems of linear equations
can be solved efficiently. 
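As an illustration of how such a system arises, one common derivation applies a Newton step to the perturbed optimality conditions of (3) and (4), namely $Ax = b$, $A^{T}\lambda + s = c$, and $x_i s_i = \mu$ for all $i$. The notation $X = \operatorname{diag}(x)$, $S = \operatorname{diag}(s)$, the barrier parameter $\mu$, and the residuals $r_p$, $r_d$, $r_c$ are assumptions of this sketch; individual algorithm variants differ in how they choose $\mu$ and the step length.

```latex
\begin{aligned}
A\,\Delta x &= r_p, & r_p &= b - A x, \\
A^{T} \Delta\lambda + \Delta s &= r_d, & r_d &= c - A^{T}\lambda - s, \\
S\,\Delta x + X\,\Delta s &= r_c, & r_c &= \mu \mathbf{1} - X S \mathbf{1}, \\[4pt]
\underbrace{A X S^{-1} A^{T}}_{P}\, \Delta\lambda
  &= \underbrace{r_p + A S^{-1} \bigl( X r_d - r_c \bigr)}_{v}.
\end{aligned}
```

The last line follows by eliminating $\Delta s = r_d - A^{T}\Delta\lambda$ and $\Delta x = S^{-1}(r_c - X \Delta s)$ from the first three equations. Once $\Delta\lambda$ is known, $\Delta s$ and $\Delta x$ are obtained by back-substitution.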

A variant worth mentioning is the class
of so-called infeasible-interior-point algorithms. The reason is 
that very often it is easy to find an initial primal feasible point 
or it is easy to find an initial dual feasible point but not both 
at the same time. Therefore, one starts the algorithm with a 
primal/dual point pair where the primal and/or the dual point 
are infeasible points; the algorithm then tries to decrease the 
amount of "infeasibility" (a quantity that we will not define 
here) at every iteration, besides of course optimizing the cost 
function. 

One of the most intriguing aspects of primal-dual interior- 
point algorithms is the polynomial-time worst-case bounds that 
can be stated. Of course, these bounds say mostly something 
about the behavior when the algorithm is already close to 
the solution vertex. It remains to be seen if these results 
are useful for implementations of the LP decoder where it is 
desirable that the initial iterations are as aggressive as possible 
and where the behavior close to a vertex is not that crucial. 
(We remind the reader of the rounding procedure that was
discussed at the end of Sec. I, a procedure that took advantage
of some special properties of the fundamental polytope.)



IV. Efficient Approaches for Solving Pu = v 
where P is a Positive Definite Matrix 

In Secs. II and III we saw that the crucial part in the
discussed algorithms was to repeatedly and efficiently solve 
a system of linear equations that looks like 

Pu = v, 

where P is an iteration-dependent positive definite matrix and 
where v is an iteration-dependent vector. The fact that P is 
positive definite helps because u can also be seen to be the 
solution of the quadratic unconstrained optimization problem 

$$\begin{aligned} \text{minimize} \quad & \tfrac{1}{2} u^{T} P u - \langle v, u \rangle \\ \text{subj. to} \quad & u \in \mathbb{R}^{h}, \end{aligned} \tag{5}$$

where we assumed that $P$ is an $h \times h$ matrix. It is important to
remark that for the algorithms in Secs. II and III the vector $u$
usually does not have to be found perfectly. It is good enough
to find an approximation of $u$ that is close enough to the
correct $u$. (For more details, see e.g. [18, Ch. 9].)

Using a standard gradient-type algorithm to find u might 
work. However, the matrix P is often ill-conditioned, i.e., the 
ratio of the largest to the smallest eigenvalue can be quite big 
(especially towards the final iterations), and so the convergence 
speed of a gradient-type algorithm might suffer considerably. 
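The effect of the conditioning can be quantified with a standard bound (not derived in this paper). Writing $f(u)$ for the cost function of the quadratic problem above, $u^{*}$ for its minimizer, $\kappa$ for the ratio of the largest to the smallest eigenvalue of $P$, and $\|w\|_{P} = \sqrt{w^{T} P w}$, steepest descent with exact line search satisfies:

```latex
\nabla f(u) = P u - v, \qquad
\| u_{k+1} - u^{*} \|_{P} \;\le\; \frac{\kappa - 1}{\kappa + 1}\, \| u_{k} - u^{*} \|_{P},
\qquad \kappa = \frac{\lambda_{\max}(P)}{\lambda_{\min}(P)} .
```

For example, $\kappa = 100$ only guarantees a reduction factor of $99/101 \approx 0.98$ per iteration.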

Therefore, more sophisticated approaches are desirable. 
Such an approach is the conjugate-gradient algorithm which 
was introduced by Hestenes and Stiefel [24]. (See Shewchuk's 
paper [25] for a very readable introduction to this topic and 
for some historical comments.) This method is especially 
attractive when P is sparse or when P can be written as a 
product of sparse matrices, the latter being the case for LP 
decoding of LDPC codes. 
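A minimal Python sketch of the conjugate-gradient iteration follows. It accesses $P$ only through a function that computes $P u$, which is what makes the method attractive when $P$ is a product of sparse matrices. The helper `make_matvec`, which applies $A \operatorname{diag}(d_2) A^{T}$ factor by factor, is our own hypothetical illustration of such a product; the paper does not prescribe this particular form, and a real implementation would store only the nonzero entries of $A$.

```python
def conjugate_gradient(matvec, v, tol=1e-10, max_iter=1000):
    """Approximately solve P u = v for a symmetric positive definite P that is
    given only through matvec(u) = P u. Stops once the residual norm <= tol."""
    u = [0.0] * len(v)
    r = list(v)                      # residual v - P u for the start u = 0
    p = list(r)                      # first search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:
            break
        Pp = matvec(p)
        alpha = rs / sum(pi * qi for pi, qi in zip(p, Pp))
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Pp)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return u


def make_matvec(A, d2):
    """Return u -> A diag(d2) A^T u without ever forming the product matrix."""
    m, n = len(A), len(d2)

    def matvec(u):
        t = [d2[k] * sum(A[i][k] * u[i] for i in range(m)) for k in range(n)]
        return [sum(A[i][k] * t[k] for k in range(n)) for i in range(m)]

    return matvec
```

Because an approximate $u$ is usually sufficient for the outer interior-point iteration, `tol` and `max_iter` can be chosen loosely to trade accuracy for speed.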

In the context of the affine-scaling algorithm, e.g. Resende 
and Veiga [26] used the conjugate-gradient algorithm to effi- 
ciently solve the relevant equation systems. They also studied 
the behavior of the conjugate-gradient algorithm with different 
pre-conditioners. 

A quite different, yet interesting, way to solve the minimization
problem in (5) is to use graphical models. Namely,
one can represent the cost function in (5) by an additive
factor graph [27], [28], [29]. Of course, there are a variety of
factor graph representations for this cost function; however,
probably the most reasonable choice in the context of LP 
decoding is to choose the factor graph that looks topologically 
like the factor graph that is usually used for sum-product or 
min-sum algorithm decoding of LDPC codes. One can then 
try to find the solution with the help of the min-sum algorithm. 

[Equivalently, one can look at the maximization problem

$$\begin{aligned} \text{maximize} \quad & \exp\bigl( -\tfrac{1}{2} u^{T} P u + \langle v, u \rangle \bigr) \\ \text{subj. to} \quad & u \in \mathbb{R}^{h}. \end{aligned} \tag{6}$$

Here the function to be optimized is proportional to a Gaussian
density and can be represented with a Gaussian factor
graph [27], [28], [29] (which, in contrast to the above factor
graph, is a multiplicative factor graph). One can then try to find
the solution with the help of the max-product algorithm, which
in the case of Gaussian graphical models is equivalent (up to
proportionality constants) to the sum-product algorithm.]

Fig. 2. Replacement of a partial factor graph representing a degree-$k$ check
function node by another partial factor graph with $k - 2$ check nodes of degree
three and with $k - 3$ new auxiliary variable nodes. (Here $k = 6$.)

The reason for this being an interesting approach is that 
the behavior of the min-sum algorithm applied to quadratic-cost-function
factor graphs is much better understood than for
other factor graphs. E.g., it is known that if the algorithm 
converges then the solution vector is correct. Moreover, by 
now there are also practically verifiable sufficient conditions 
for convergence [30], [31], [32]. However, the quadratic-cost- 
function factor graphs needed for the above problem are more 
general than the special class of quadratic-cost-function factor 
graphs considered in the cited papers. Of course, one could
represent the cost function in (5) by a factor graph within
this special class (so that the above-mentioned results are
applicable); however, and quite interestingly, when this cost
function is represented by a factor graph that is not in this 
special class, then the convergence conditions seem to be 
(judging from some empirical evidence) less stringent. In fact, 
we obtained some very interesting behavior in the context 
of the short-step affine-scaling algorithm where only one 
iteration of the min-sum algorithm was performed per iteration 
of the affine-scaling algorithm. (The min-sum algorithm was 
initialized with the messages obtained in the previous affine- 
scaling algorithm iteration.) 
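To make the min-sum approach concrete, the following Python sketch runs the min-sum algorithm on the pairwise factor graph of the quadratic cost $\tfrac{1}{2} u^{T} P u - \langle v, u \rangle$, with one variable node per coordinate and one factor per nonzero off-diagonal entry of $P$. This pairwise representation is our own choice for illustration. It is not the LDPC-like factor graph preferred above, and it may or may not coincide with the special class referred to in the cited papers. The function name and the synchronous message schedule are likewise assumptions.

```python
def gaussian_min_sum(P, v, iters=100):
    """Synchronous min-sum on the pairwise graph of 1/2 u^T P u - <v, u>
    for a symmetric matrix P; returns the estimated minimizer."""
    h = len(v)
    nbrs = [[j for j in range(h) if j != i and P[i][j] != 0] for i in range(h)]
    # The message i -> j is the quadratic 1/2 * a[i][j] * u_j^2 - b[i][j] * u_j.
    a = [[0.0] * h for _ in range(h)]
    b = [[0.0] * h for _ in range(h)]
    for _ in range(iters):
        a_new = [[0.0] * h for _ in range(h)]
        b_new = [[0.0] * h for _ in range(h)]
        for i in range(h):
            for j in nbrs[i]:
                # Local term at node i plus all incoming messages except the one from j.
                quad = P[i][i] + sum(a[k][i] for k in nbrs[i] if k != j)
                lin = v[i] + sum(b[k][i] for k in nbrs[i] if k != j)
                # Closed-form minimization over u_i of the resulting quadratic.
                a_new[i][j] = -P[i][j] ** 2 / quad
                b_new[i][j] = -P[i][j] * lin / quad
        a, b = a_new, b_new
    return [(v[i] + sum(b[k][i] for k in nbrs[i])) /
            (P[i][i] + sum(a[k][i] for k in nbrs[i])) for i in range(h)]
```

Warm-starting, as described above for the affine-scaling experiments, would correspond to keeping `a` and `b` between calls instead of resetting them to zero; the sketch omits this for brevity.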

V. Other Simplifications 

Depending on the algorithm used, there are many small (but
very useful) variations that can lead - when properly applied
- to considerable simplifications. E.g., one can replace the
partial factor graph in Fig. 2(a) by the partial factor graph
in Fig. 2(b) that contains new auxiliary variable nodes but
contains only check nodes of degree three [33], [34]. Or, one
can adaptively modify the set of inequalities that are included 
in the LP formulation [9], [14]. 
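The first of these simplifications can be sketched in Python. The chaining order and the naming of the auxiliary variables below are our own assumptions; [33], [34] may arrange the small checks differently. The helper `has_valid_aux` is a brute-force check, for variables named 0, ..., k - 1, that the chain of degree-three checks accepts exactly the even-weight assignments.

```python
from itertools import product


def split_check(variables):
    """Replace one parity check on k variables by k - 2 checks of degree three,
    chained through k - 3 auxiliary variables ('aux', 0), ('aux', 1), ..."""
    k = len(variables)
    if k <= 3:
        return [list(variables)], []
    aux = [("aux", t) for t in range(k - 3)]
    checks = [[variables[0], variables[1], aux[0]]]
    for t in range(1, k - 3):
        checks.append([aux[t - 1], variables[t + 1], aux[t]])
    checks.append([aux[-1], variables[k - 2], variables[k - 1]])
    return checks, aux


def has_valid_aux(x, checks, aux):
    """True if some 0/1 assignment of the auxiliary variables satisfies all
    small checks, where x[i] is the value of the original variable named i."""
    for a in product((0, 1), repeat=len(aux)):
        value = dict(enumerate(x))
        value.update(zip(aux, a))
        if all(sum(value[name] for name in c) % 2 == 0 for c in checks):
            return True
    return False
```

For $k = 6$, `split_check(list(range(6)))` returns four checks of degree three and three auxiliary variables, matching the counts $k - 2$ and $k - 3$ in Fig. 2.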

VI. Conclusion 

We have presented some initial considerations towards using 
interior-point algorithms for obtaining efficient LP decoders. 
Encouraging preliminary results have been obtained but more 
research is needed to fully understand and exploit the potential 
of these algorithms. 